Object Storage
Table of Contents
- What is Object Storage
- Why Not Traditional Databases
- How Object Storage Works
- Key Design Principles
- System Design Best Practices
- Pre-signed URLs
- Multi-part Upload
- Popular Object Storage Services
- Use Cases
- Interview Questions
What is Object Storage
Definition
Object Storage is a storage architecture designed for managing large files, commonly referred to as Binary Large Objects (BLOBs). It is not a database in the traditional sense, but it plays a similar role: a purpose-built system optimized for storing and retrieving large, mostly static files.
What Qualifies as a BLOB?
- Images and Photos: Profile pictures, product images, thumbnails
- Videos: User-generated content, streaming media, recorded sessions
- Audio Files: Music, podcasts, voice recordings
- Documents: PDFs, presentations, large text files
- Data Files: JSON exports, CSV files, log files
- Static Assets: CSS, JavaScript, fonts, icons
Core Characteristics
- File-based Storage: Stores complete files as atomic units
- Flat Namespace: No hierarchical folder structure (despite UI appearances)
- Immutable: Files cannot be modified, only replaced or versioned
- Highly Durable: 99.999999999% (11 9's) durability through redundancy
- Scalable: Handles petabytes of data across distributed infrastructure
- Cost-Effective: Optimized for storage costs rather than compute
Why Not Traditional Databases
The Problem with Storing BLOBs in Relational Databases
Storage Inefficiency:
PostgreSQL Example:
- Packs row data into fixed 8KB pages
- 4MB image = 512 pages (4MB ÷ 8KB)
- Massive overhead for simple queries
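The page math above can be checked directly (this ignores TOAST compression and per-page header overhead, so it is an upper-bound sketch, not PostgreSQL's exact behavior):

```python
import math

page_size = 8 * 1024          # PostgreSQL's fixed page size: 8 KB
image_size = 4 * 1024 * 1024  # a 4 MB image

# Number of 8 KB pages a 4 MB value would span if packed into pages.
pages_needed = math.ceil(image_size / page_size)
print(pages_needed)  # 512
```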
Performance Impact
Query Performance Degradation:
-- Simple query becomes expensive
SELECT * FROM users LIMIT 50;
-- Database must manage megabytes of image data
-- Even when you only need user metadata
Issues Created:
- Memory Pressure: Large files consume excessive RAM
- Slow Queries: Simple operations become resource-intensive
- Cache Pollution: BLOBs fill up database cache inefficiently
Replication Problems
Bandwidth Consumption:
- 4MB image replicated to 3 database replicas = 12MB per write
- Massive bandwidth usage
- Increased replication lag
- Higher infrastructure costs
Backup and Recovery Issues
Backup Bloat:
- Database backups include all BLOB data
- What should be minutes becomes hours
- Recovery time dramatically increased
- Storage costs for backups skyrocket
Real-World Scenario:
Without Object Storage:
Database backup: 500GB (400GB are images)
Restore time: 8 hours
With Object Storage:
Database backup: 100GB (metadata only)
Restore time: 30 minutes
How Object Storage Works
High-Level Architecture
Client Request → Metadata Service → Storage Nodes → Stream Response
↓ ↓ ↓ ↓
"Get file1" Index Lookup Server A Direct streaming
↓
"File1 on Server A"
Core Components
1. Storage Nodes
- Cheap commodity servers storing files on disk
- Distributed across multiple racks and data centers
- Optimized for throughput rather than low latency
2. Metadata Service
- Central index mapping file identifiers to storage locations
- Fast lookup service (usually in-memory)
- Handles routing and load balancing
3. Redundancy Layer
- Files stored on multiple servers (typically 3+ copies)
- Erasure coding or full replication
- Automatic healing when nodes fail
- Cross-datacenter replication for disaster recovery
Request Flow
- Client requests file by unique identifier
- Metadata service performs index lookup
- Storage location identified (e.g., Server A)
- Direct streaming from storage node to client
- Redundancy ensures availability if primary fails
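The lookup-then-stream flow can be sketched with an in-memory metadata index (all names here are illustrative; a real metadata service is sharded, replicated, and far more involved):

```python
# Sketch of the request flow: metadata lookup, then a direct read from a
# storage node that holds the object. Replicas provide failover.

# Metadata service: maps object key -> storage nodes holding a copy.
metadata_index = {
    "user123.jpg": ["server-a", "server-b", "server-c"],
}

# Storage nodes: each holds raw bytes for its assigned keys.
storage_nodes = {
    "server-a": {"user123.jpg": b"...image bytes..."},
    "server-b": {"user123.jpg": b"...image bytes..."},
    "server-c": {"user123.jpg": b"...image bytes..."},
}

def get_object(key):
    # 1. Index lookup in the metadata service.
    replicas = metadata_index[key]
    # 2. Try replicas in order; redundancy covers a failed primary.
    for node in replicas:
        data = storage_nodes.get(node, {}).get(key)
        if data is not None:
            return data  # 3. Bytes stream directly to the client.
    raise FileNotFoundError(key)

print(get_object("user123.jpg"))
```

Deleting `server-a` from `storage_nodes` still returns the object, which is the availability property the flow above describes.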
Key Design Principles
1. Flat Namespace
Traditional File System:
/users/photos/2024/january/profile_pics/user123.jpg
Object Storage:
user-photos-2024-01-user123.jpg
Benefits:
- Direct lookup without tree traversal
- Faster access - O(1) instead of O(log n)
- Simpler implementation and maintenance
- UI sugar can simulate folders for user experience
2. Immutable Writes
Traditional Database: Update existing records
UPDATE users SET profile_image = 'new_image.jpg' WHERE id = 123;
Object Storage: Create new versions or overwrite
PUT /bucket/user123-profile-v2.jpg
Advantages:
- No locks required - eliminates race conditions
- Simpler concurrency model
- Version control capabilities
- Better performance without locking overhead
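Because writes never mutate existing bytes, a "change" is just a new version, which is why no locks are needed. A minimal sketch of append-only versioning (the key name and version scheme are illustrative):

```python
# Sketch: immutable writes as append-only versions. Each PUT creates a
# new version instead of modifying bytes in place, so readers never see
# a half-written object and writers never need a lock.

bucket = {}  # key -> list of versions, oldest first

def put_object(key, data):
    bucket.setdefault(key, []).append(data)
    return len(bucket[key])  # version number assigned to this write

def get_object(key, version=None):
    versions = bucket[key]
    return versions[-1] if version is None else versions[version - 1]

put_object("user123-profile.jpg", b"v1 bytes")
put_object("user123-profile.jpg", b"v2 bytes")
print(get_object("user123-profile.jpg"))     # latest version
print(get_object("user123-profile.jpg", 1))  # explicit old version
```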
3. Redundancy and Durability
Replication Strategy:
File "user123.jpg" exists on:
- Server A (Primary)
- Server B (Replica 1)
- Server C (Replica 2)
- Server D (Cross-DC replica)
Durability Guarantees:
- 11 9's durability: 99.999999999%
- Automatic failure recovery
- Cross-datacenter redundancy
- Background data integrity checks
System Design Best Practices
1. Hybrid Storage Pattern
Correct Approach:
Database (PostgreSQL/MySQL):
├── User metadata (ID, name, email, created_at)
├── Post metadata (ID, title, text, user_id)
└── File references (file_url, file_size, file_type)
Object Storage (S3):
├── Profile images
├── Post photos/videos
└── User uploads
Example Schema:
-- Store metadata in database
CREATE TABLE posts (
    id SERIAL PRIMARY KEY,
    user_id INTEGER,
    title VARCHAR(255),
    content TEXT,
    image_url VARCHAR(500),  -- Reference to S3
    created_at TIMESTAMP
);
-- Files stored in S3: s3://bucket/posts/user123/post456.jpg
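The hybrid split can be exercised locally with SQLite standing in for PostgreSQL; the bucket name and key layout below are illustrative, and the actual byte upload to object storage is elided:

```python
import sqlite3

# Metadata lives in the database; only a *reference* to the object does.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        user_id INTEGER,
        title TEXT,
        image_url TEXT  -- reference to the object in S3, not the bytes
    )
""")

def create_post(user_id, title, filename):
    # The file bytes would be uploaded to object storage separately;
    # the database row stores only the resulting key.
    image_url = f"s3://my-bucket/posts/user{user_id}/{filename}"
    cur = db.execute(
        "INSERT INTO posts (user_id, title, image_url) VALUES (?, ?, ?)",
        (user_id, title, image_url),
    )
    return cur.lastrowid

post_id = create_post(123, "Hello", "post456.jpg")
row = db.execute(
    "SELECT image_url FROM posts WHERE id = ?", (post_id,)
).fetchone()
print(row[0])  # s3://my-bucket/posts/user123/post456.jpg
```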
2. Common Architecture Pattern
Client → API Server → Database (metadata)
↓ ↓
└── Object Storage ← File URL
Flow Example:
- Client requests social media feed
- API server queries database for posts metadata
- Database returns post data with S3 URLs
- Client downloads images directly from S3
3. Metadata vs File Storage
Store in Database:
- File metadata (size, type, upload date)
- User permissions and access controls
- File relationships and associations
- Search indices and tags
Store in Object Storage:
- Actual file bytes
- Multiple file versions
- Thumbnails and processed variants
- Archive and backup copies
Pre-signed URLs
The Problem
Inefficient File Upload:
Client → Server → Object Storage
4MB 4MB 4MB
↑ ↑ ↑
Bandwidth Server Final
consumed load destination
The Solution
Direct Upload with Pre-signed URLs:
1. Client requests upload permission
Client → Server: "I want to upload user123.jpg"
2. Server requests pre-signed URL
Server → S3: "Give me upload URL for user123.jpg, valid 1 hour"
3. S3 returns pre-signed URL
S3 → Server: "https://bucket.s3.amazonaws.com/user123.jpg?signature=..."
4. Client uploads directly
Client → S3: Direct upload using pre-signed URL
Implementation Example
Server-side (generating pre-signed URL):
# Python example (boto3)
import boto3

s3_client = boto3.client('s3')

def generate_upload_url(filename, file_type):
    # Pre-signed PUT URL, valid for one hour
    presigned_url = s3_client.generate_presigned_url(
        'put_object',
        Params={
            'Bucket': 'my-bucket',
            'Key': filename,
            'ContentType': file_type
        },
        ExpiresIn=3600  # 1 hour
    )
    return presigned_url
Client-side (using pre-signed URL):
// JavaScript example
const uploadFile = async (file, presignedUrl) => {
  const response = await fetch(presignedUrl, {
    method: 'PUT',
    body: file,
    headers: {
      'Content-Type': file.type,
    },
  });
  return response.ok;
};
Benefits
- Reduced server bandwidth - no proxy through application server
- Better scalability - server doesn't handle large file processing
- Faster uploads - direct connection to object storage
- Security - temporary, scoped permissions
- Cost savings - reduced data transfer costs
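The security property comes from the signature itself. A simplified HMAC scheme (not AWS Signature V4; just the shape of the idea, with an invented host and secret) shows how storage can verify a URL without any database lookup:

```python
import hashlib
import hmac
import time

SECRET = b"server-side-secret"  # shared between API server and storage

def presign(key, expires_in=3600, now=None):
    # Sign "key:expiry" so the URL works only for this object and window.
    expiry = int(now if now is not None else time.time()) + expires_in
    msg = f"{key}:{expiry}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"https://storage.example.com/{key}?expires={expiry}&sig={sig}"

def verify(key, expiry, sig, now=None):
    # Storage recomputes the signature; no shared database needed.
    current = now if now is not None else time.time()
    if current > int(expiry):
        return False  # URL expired
    expected = hmac.new(SECRET, f"{key}:{expiry}".encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)

url = presign("user123.jpg", expires_in=3600, now=1_000_000)
expiry = url.split("expires=")[1].split("&")[0]
sig = url.split("sig=")[1]
print(verify("user123.jpg", expiry, sig, now=1_000_000))  # True
print(verify("user123.jpg", expiry, sig, now=2_000_000))  # False (expired)
```

Tampering with the key or the expiry invalidates the signature, which is what makes the permission both time-limited and scope-limited.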
Multi-part Upload
The Problem
File Size Limitations:
- Single-request PUT limits (5GB per object for S3; larger objects require multi-part)
- Browser upload limits
- Gateway and proxy limitations
- Network timeout constraints for large files
The Solution
Chunked Upload Process:
Large File (1GB)
↓
Split into chunks (5MB each)
↓
Upload chunks in parallel
↓
Object storage reassembles
Multi-part Upload Flow
1. Initiate Upload:
   Client → S3: "I want to upload 1GB file"
   S3 → Client: "Upload ID: abc123" (client picks part size, minimum 5MB)
2. Upload Chunks:
   Chunk 1 (5MB) → S3 → Part 1 ETag
   Chunk 2 (5MB) → S3 → Part 2 ETag
   Chunk 3 (5MB) → S3 → Part 3 ETag
   ... (parallel uploads)
   Chunk 200 (5MB) → S3 → Part 200 ETag
3. Complete Upload:
   Client → S3: "Complete upload abc123 with parts [ETag1, ETag2, ...]"
   S3 → Client: "Upload complete, file assembled"
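The client-side chunking in step 2 can be sketched in pure Python (the 5MB floor mirrors S3's minimum part size; the actual per-part upload calls to the storage API are elided):

```python
import hashlib

PART_SIZE = 5 * 1024 * 1024  # S3's minimum part size (last part may be smaller)

def split_into_parts(data, part_size=PART_SIZE):
    # Yield (part_number, chunk) pairs; part numbers start at 1, as in S3.
    for i in range(0, len(data), part_size):
        yield (i // part_size + 1, data[i:i + part_size])

def etag_for(chunk):
    # Object storage returns an ETag per uploaded part; MD5 stands in here.
    return hashlib.md5(chunk).hexdigest()

# Simulate a 12 MB file: three parts (5 MB, 5 MB, 2 MB).
data = b"x" * (12 * 1024 * 1024)
parts = [(num, etag_for(chunk), len(chunk))
         for num, chunk in split_into_parts(data)]

print([(num, size) for num, _, size in parts])
# The "complete" call then sends [(part_number, etag), ...] back to storage.
```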
Implementation Benefits
- Parallel uploads - faster overall transfer
- Resumable uploads - retry individual chunks on failure
- Better reliability - smaller chunks less likely to fail
- Progress tracking - granular upload progress
- Bandwidth optimization - can adjust chunk size
Example Architecture
Client Application
├── File chunking logic
├── Parallel upload management
├── Progress tracking
└── Error retry mechanism
↓
Object Storage
├── Multi-part upload API
├── Chunk validation
├── Assembly service
└── Cleanup of incomplete uploads
Popular Object Storage Services
Amazon S3 (Simple Storage Service)
Market Leader:
- Most widely used and documented
- Default choice for system design interviews
- Extensive feature set and integrations
- Global availability
Key Features:
- Pre-signed URLs for secure access
- Multi-part upload (5MB minimum part size)
- Storage classes for cost optimization
- Cross-region replication
- Event notifications
Google Cloud Storage
Google's Offering:
- Similar features to S3
- Strong integration with Google Cloud Platform
- Competitive pricing
- Multi-regional storage options
Azure Blob Storage
Microsoft's Solution:
- Integrated with Azure ecosystem
- Hot, cool, and archive storage tiers
- Strong enterprise adoption
- Similar API patterns to competitors
Common Features Across All
- Pre-signed/Signed URLs for secure access
- Multi-part upload capabilities
- Versioning and lifecycle management
- Encryption at rest and in transit
- Access controls and permissions
- CDN integration for global distribution
Use Cases
1. Social Media and Content Platforms
Architecture Example:
User Posts → Metadata in Database → Photos/Videos in S3
↓ ↓
Post feed API Direct download URLs
Components:
- User-generated content (photos, videos)
- Profile images and cover photos
- Story content and highlights
- Live streaming archives
2. Collaborative Tools and File Sharing
Examples:
- Dropbox-like services: File storage and synchronization
- Design tools: Large design files and assets
- Document management: PDFs, presentations, spreadsheets
Pattern:
File Upload → Pre-signed URL → Direct S3 Upload
File Sharing → Signed URL → Direct S3 Download
3. Web Application Assets
Static Content Delivery:
- CSS and JavaScript files
- Images and icons
- Fonts and media assets
- Usually fronted by CDN for global distribution
Architecture:
Web App → CDN → Object Storage
↓
Global edge locations
4. Data Processing and Analytics
Big Data Storage:
- Log files: Application logs, server logs, audit trails
- ML training data: Large datasets for machine learning
- Data exports: Database dumps, report files
- Backup archives: System backups and snapshots
5. Media and Entertainment
Content Storage:
- Video streaming libraries
- Music catalogs
- Podcast archives
- Image galleries
- 360-degree content and VR assets
Interview Questions
1. "Why would you use object storage instead of a traditional database for storing images?"
Answer Framework:
- Performance: Traditional databases aren't optimized for large files
- Scalability: Object storage scales horizontally with lower costs
- Efficiency: Reduces database backup size and replication overhead
- Specialization: Purpose-built for file storage with features like pre-signed URLs
2. "How would you design a photo-sharing application's storage architecture?"
System Design Approach:
Users upload photos:
1. Client gets pre-signed URL from API server
2. Client uploads directly to S3
3. API server stores metadata in database
4. Feed requests return metadata + S3 URLs
5. Client downloads images directly from S3
Key Components:
- Database for post metadata and user data
- S3 for actual image storage
- CDN for global image delivery
- Image processing service for thumbnails
3. "What are pre-signed URLs and when would you use them?"
Explanation:
- Temporary URLs with embedded authentication
- Use cases: Secure uploads, private file access, reducing server load
- Benefits: Direct client-to-storage communication, better performance
- Security: Time-limited, scope-limited permissions
4. "How do you handle uploading very large files (>1GB)?"
Multi-part Upload Strategy:
- Split large files into chunks (typically 5MB)
- Upload chunks in parallel for better performance
- Handle chunk failures independently
- Reassemble on object storage side
- Provide progress tracking and resumability
5. "Compare object storage with a traditional file system"
Key Differences:
| Aspect | Object Storage | Traditional File System |
|---|---|---|
| Namespace | Flat | Hierarchical |
| Scalability | Horizontal | Vertical |
| Durability | 11 9's with replication | Depends on RAID setup |
| Access | HTTP REST API | File system calls |
| Consistency | Varies by provider (S3 is now strongly consistent) | Strongly consistent |
| Cost | Pay per GB stored | Fixed infrastructure |
6. "How would you implement a file upload feature for a web application?"
Implementation Steps:
- Client requests upload: Send file metadata to server
- Server validation: Check file type, size, permissions
- Generate pre-signed URL: Request from S3 with expiration
- Direct upload: Client uploads to S3 using pre-signed URL
- Metadata storage: Server stores file reference in database
- Confirmation: Return success response with file URL
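Steps 1–3 above (request, validate, issue URL) can be sketched as one server-side function; the allowed types, size limit, and URL builder are illustrative stand-ins for a real storage SDK call:

```python
ALLOWED_TYPES = {"image/jpeg", "image/png"}
MAX_SIZE = 10 * 1024 * 1024  # 10 MB, an illustrative limit

def request_upload(filename, content_type, size_bytes):
    # Step 2: validate before handing out any upload credentials.
    if content_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported type: {content_type}")
    if size_bytes > MAX_SIZE:
        raise ValueError("file too large")
    # Step 3: a real server would call the storage SDK here
    # (e.g. a pre-signed URL API); this stand-in just shapes the response.
    return {
        "upload_url": f"https://storage.example.com/uploads/{filename}?sig=...",
        "expires_in": 3600,
    }

grant = request_upload("user123.jpg", "image/jpeg", 2 * 1024 * 1024)
print(grant["expires_in"])  # 3600
```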
7. "What are the trade-offs of using object storage?"
Advantages:
- Massive scalability and durability
- Cost-effective for large files
- Built-in redundancy
- Global accessibility
Disadvantages:
- Eventual consistency for some providers or operations (S3 is now strongly consistent)
- No file modification capabilities
- API overhead for small operations
- Network dependency for access
8. "Design a system to handle 1 million image uploads per day"
Architecture Considerations:
- Load balancing: Distribute pre-signed URL requests
- Horizontal scaling: Multiple API servers
- Database optimization: Efficient metadata storage
- Monitoring: Track upload success rates and performance
- Error handling: Retry mechanisms and cleanup processes
- Security: Rate limiting and access controls